** Data reading and variable selection from raw data
** Chinese Longitudinal Healthy Longevity Survey (CLHLS) 1998

** 01. Reading data **

cap log close
clear all
set more off
cd /*insert your working directory here*/
use /*insert the path to your data file here*/


** 02. Constructing year and country variables **

ge year=1998
lab var year "survey year"

ge country=156
lab var country "ISO country code"
//China: 156 (ISO Country Codes) 


** 03. ID variables **

ge pid=ID
lab var pid "person id"
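
//optional check (a sketch, not in the original script): report duplicate
//person ids without halting the do-file; pid is assumed to identify one
//respondent per row
duplicates report pid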


** 04. Basic Demographics (Sex and Age/birth year) **

ge sex=A1
lab var sex "sex"
lab def sex 1 "male" 2 "female"
lab val sex sex

ge age=1998-V_BTHYR
lab var age "age"

ge birthyr=V_BTHYR
lab var birthyr "year of birth"
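
//optional check (a sketch, not in the original script): count implausible
//ages, which would point to missing-value codes in V_BTHYR; the bounds 0 and
//130 are assumptions, so check the codebook before recoding anything
count if age<0 | (age>130 & !missing(age))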


** 05. Siblings **

ge nsibs=F9
lab var nsibs "number of siblings"
lab def nsibs 99 "missing"
lab val nsibs nsibs

ge nbro=0
ge nsis=0
local vlist A B C D E F G H I J K L M N O
foreach var of local vlist {
	replace nbro=nbro+1 if F92`var'2==1
	replace nsis=nsis+1 if F92`var'2==2
}
lab var nbro "number of brothers"
lab var nsis "number of sisters"
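
//optional check (a sketch, not in the original script): count cases where the
//listed brothers and sisters exceed the reported total; 99 is the missing code
//for nsibs used in this script
count if nbro+nsis>nsibs & nsibs!=99 & !missing(nsibs)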

** 06. Own education **

//only available in years; 66.62% of respondents spent 0 years at school
rename F1 eduy
lab var eduy "years of schooling"


** 07. Parents' education: Father and/or Mother **

//parents' education not available


** 08. Own occupation **

//before age 60
rename F2 occ
lab var occ "main occupation before age 60"


** 09. Parents' occupation **

//only father's job before age 60 is available
rename F84 faocc
lab var faocc "father's occupation before age 60"


** 10. Tabulate the Identified Variables **

log using /*insert your log file path here*/, replace text

** Data reading and variable selection from raw data
** Chinese Longitudinal Healthy Longevity Survey (CLHLS) 1998

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs nbro nsis, d

** R's Own Education **
tab1 eduy

** Parental Education **
//NA

** R's Own Occupation **
tab occ

** Parental Occupation **
tab faocc

log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nsibs nbro nsis ///
	 eduy occ faocc


** 12. Save the Data File **

saveold /*insert your output file path here*/, replace



** 13. Homogenising education **
** Own Education **
rename eduy educ_yrs
//own education local categories not available
replace educ_yrs=. if educ_yrs==99

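//note: the leading zero in 020 is not stored; the numeric value is 20
ge educ_ISCED=020 if educ_yrs<6
replace educ_ISCED=100 if educ_yrs>=6 & educ_yrs<9
replace educ_ISCED=244 if educ_yrs>=9 & educ_yrs<12
replace educ_ISCED=344 if educ_yrs>=12 & educ_yrs<15
replace educ_ISCED=554 if educ_yrs>=15 & educ_yrs<19
replace educ_ISCED=667 if educ_yrs>=19 & educ_yrs<22
//missing values sort above all numbers in Stata, so exclude them explicitly
replace educ_ISCED=767 if educ_yrs>=22 & !missing(educ_yrs)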
lab var educ_ISCED "respondent highest education in ISCED code"
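
//optional check (a sketch, not in the original script): show the range of
//schooling years within each ISCED category, then count cases with missing
//years that still received a code (this count should be 0)
tabstat educ_yrs, by(educ_ISCED) statistics(min max n)
count if missing(educ_yrs) & !missing(educ_ISCED)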

** Parents Education **
//parents' education not available


** 14. Homogenising siblings (by Manting Chen, 2017/4/3) **
//cutoff
ge nbro_flag=99
lab var nbro_flag "cutoff of number of brothers"
ge nsis_flag=99
lab var nsis_flag "cutoff of number of sisters"
ge nsibs_flag=99
lab var nsibs_flag "cutoff of total number of siblings"

lab def nsib_flag 99 "no cutoff"
lab val nbro_flag nsis_flag nsibs_flag nsib_flag

//recode missing
replace nsibs=. if nsibs==99

** 15. Tab Education and Sibling Variables **
tab1 sex age birthyr
tab1 educ_yrs 
tab1 nbro nsis nsibs nbro_flag nsis_flag nsibs_flag


** 16. Save the Data File **

saveold /*insert your output file path here*/, replace

